Goto

Collaborating Authors

 lexicon entry


Supplementary Material for Grammar-Based Grounded Lexicon Learning

Neural Information Processing Systems

In the supplementary material, we describe the domain specific languages used in our experiments (Section 1), demonstrate how the proposed CKY-E2 method works by a concrete example (Section 2.1), show formal properties of CKY-E2 (Section 2.2), present dataset setups and analyze model behaviors (Section 3), and list environmental details for experiments (Section??). In this section, we will present and discuss the domain-specific languages (DSLs) we use for two domains: visual reasoning and language-guided navigation. We will further introduce the neurosymbolic module we have designed for executing programs in these two domains. Overall, each DSL contains a set of types and a set of deterministic modules that have been manually designed for realizing necessary operations in these domains. However, in contrast to realizing them as we do in standard programming languages (with for-loops and if-conditions), we will be using tensor operations (e.g., tensor additions and multiplications) to realize them so that the output of each program is differentiable with respect to all of its inputs. We refer readers to the original papers for a detailed introduction to the DSL and neuro-symbolic program execution. Here we only highlight the key aspects of our language and its neuro-symbolic realization, and discuss the difference between our implementation and the ones in original papers. Our visual reasoning DSL is a subset of CLEVR, containing 6 types and 8 primitive operations. Table 1 illustrates all 6 types and how they are internally represented in neuro-symbolic execution. Table 2 further shows all operations in the DSL. There are two main differences between the DSL used by G2L2 and the original CLEVRDSL.





Grammar-Based Grounded Lexicon Learning

Neural Information Processing Systems

We present Grammar-Based Grounded Language Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has a syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form $\lambda x.\textit{filter}(x, \textbf{SHINY})$, where the concept SHINY is associated with a neural network embedding, which will be used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs. To facilitate learning in an exponentially-growing compositional space, we introduce a joint parsing and expected execution algorithm, which does local marginalization over derivations to reduce the training time. We evaluate G2L2 on two domains: visual reasoning and language-driven navigation. Results show that G2L2 can generalize from small amounts of data to novel compositions of words.


Grammar-Based Grounded Lexicon Learning

Neural Information Processing Systems

We present Grammar-Based Grounded Language Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has a syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form \lambda x.\textit{filter}(x, \textbf{SHINY}), where the concept SHINY is associated with a neural network embedding, which will be used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs.


Grammar-Based Grounded Lexicon Learning

Neural Information Processing Systems

We present Grammar-Based Grounded Language Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has a syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form \lambda x.\textit{filter}(x, \textbf{SHINY}), where the concept SHINY is associated with a neural network embedding, which will be used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs.


Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs

arXiv.org Artificial Intelligence

The paper describes a system that uses large language model (LLM) technology to support the automatic learning of new entries in an intelligent agent's semantic lexicon. The process is bootstrapped by an existing non-toy lexicon and a natural language generator that converts formal, ontologically-grounded representations of meaning into natural language sentences. The learning method involves a sequence of LLM requests and includes an automatic quality control step. To date, this learning method has been applied to learning multiword expressions whose meanings are equivalent to those of transitive verbs in the agent's lexicon. The experiment demonstrates the benefits of a hybrid learning architecture that integrates knowledge-based methods and resources with both traditional data analytics and LLMs.